ABSTRACT
Pattern recognition and classification systems have been under de-
velopment for several years. This paper examines one of these systems,
which has been called an adaptive linear neuron, to determine how the 
desired classification is achieved and how this system might be used in 
the practical field of character recognition. Specifically, the follow- 
ing ideas are discussed in this paper: 

(1) The basic concepts of linear separability and iterative adap- 
tion by an adaptive linear neuron (Adaline), as applied to the 
pattern recognition and classification problem. 

(2) Four possible iterative adaption schemes which may be used to 
train an Adaline. 

(3) Use of Multiple Adalines (Madaline) and two logic layers to 
increase system capability. 

(4) Use of Adaline in the practical fields of speech recognition,
weather forecasting and adaptive control systems, and the possible
use of Madaline in the character recognition field.

1. Introduction.

Systems which can be trained to classify complex digital and analog 
patterns have been under development for several years. One such system 
has been proposed by B. Widrow and others at Stanford University [1]. 

In this system, pattern recognition and classification are accomplished 
through the use of integrative memory cells, each of which consists of a 
variable resistor whose value can be made, by suitable "training", to 
become a useful function of the experiences of the device itself. The 
training of these devices is essentially a process of iterative adaption. 
In general, training consists of a systematic adjustment of each of the 
memory elements of the device in such a way that the system is forced to 
produce a predesignated output to each of a number of specific inputs. 
After the device has successfully adapted to a large number of training 
patterns, it has the ability to classify inputs which are related to the 
training patterns, as well as the training patterns themselves. 
Characteristics of the trained system are the effectively instantaneous 
classification of pattern inputs and the diffuse storage of information 
throughout the memory elements. The basic system developed by the Stanford 
group is an adaptive threshold device called an adaptive neuron, or 
Adaline, an acronym of adaptive linear neuron. A discussion of the 
functions performed by this element will be presented later. Systems 
containing one or more of these devices are being developed for use in 
speech recognition, weather forecasting and electrocardiogram analysis.
They have also been used as trainable controllers in adaptive control 
systems. 

In this paper it is planned to review the progress made by the
Stanford group in the field of pattern recognition and classification.
Specifically, a close examination will be made of an Adaline, the basic
ideas behind its use, the procedure followed in training it and the
results obtained using a trained system. The investigation will be
prosecuted experimentally by testing digital computer simulated models.
In addition, a similar mathematical model will be applied to the
practical problem of classifying the ten characters zero through nine
when these are in a hand printed form. Finally, a start will be made on
the problem of identifying a set of 45 alphanumeric characters, also in
hand printed form.





2. The Basic Adaptive Neuron.

The basic adaptive neuron, or Adaline, developed by the Stanford
group is shown in figure 1. The inputs p_i are binary, but for
convenience they are required to take the values ±1, rather than
the more conventional +1 and 0. An input pattern is defined to be a
particular set of values of the p_i. In the Adaline each p_i is
multiplied by a corresponding (analog) weight, w_i, which can be re-
garded as the current setting of the i-th memory cell, and which can
take both positive and negative values. The weight w_0 is called the
threshold weight. Thus the analog output of the summer is

     \sum_{i=1}^{n} p_i w_i + w_0

and the digital or quantized output is

     \operatorname{sgn}\left( \sum_{i=1}^{n} p_i w_i + w_0 \right).


[Figure 1: weighted inputs feeding a Summer followed by a Quantizer]

Fig. 1. Basic Adaptive Neuron
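
The summer and quantizer described above can be sketched in a few lines of Python. This is only an illustrative model, not the Stanford hardware or the simulation used in this paper: the function names, the example weights, and the choice to map an analog output of exactly zero to +1 are all assumptions made for the sketch.

```python
def analog_output(weights, pattern):
    """Summer: weights[0] is the threshold weight w_0, weights[1:] pair with the p_i."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], pattern))


def digital_output(weights, pattern):
    """Quantizer: +1 or -1 (an analog output of exactly zero is mapped to +1 here)."""
    return 1 if analog_output(weights, pattern) >= 0 else -1


# Hypothetical weights for a two-input Adaline, with inputs taking the values +1 / -1.
weights = [1.0, 1.0, 1.0]
for pattern in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(pattern, analog_output(weights, pattern), digital_output(weights, pattern))
```

With these example weights only the pattern with both inputs negative produces a -1 output.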











3. Linear Separability. 

For simplicity, a two input Adaline will be discussed first. 
Suppose that it is desired to classify the possible input patterns into 
two classes: 

(a) one or both p_i positive, and

(b) both p_i negative.
In this particularly simple case it is easy to see that if w_0 = w_1 =
w_2 = 1, then the corresponding digital outputs will be +1 and -1. It
will be said that this setting of the weights has classified the patterns
into two classes, (a) with a +1 output, and (b) with a -1 output. In
more complicated cases, the values of the weights would be adaptively
determined during training. However, is such classification always
possible? Suppose it is desired to classify the input patterns into
the two classes:

(a) both p_i positive, or both p_i negative, and

(b) p_i's with alternate signs.
In this case it is not possible to find a set of w_i which will yield a
digital output of +1 for the first class of input and -1 for the second.


Fig. 2. Separation Into Two Output Classes

Fig. 3. Not Separable By One Adaline








It will then be said that the patterns are not "linearly separable" into
these two classes and that a classification cannot be effected with a
single Adaline.

The same thing can be illustrated graphically as in figures 2 and 3.
The analog output must change sign as the line
\sum_{i=1}^{2} p_i w_i + w_0 = 0 is crossed, and it can be seen that the
analog output above the line in figure 2 is positive, and that it is
negative below the line. Since a line can be drawn (actually an infinite
number of lines can be drawn, each with different w_i) separating the a
and b pattern classes, it can be concluded that the patterns are linearly
separable into these two classes. On the other hand, it is immediately
obvious that the classification of figure 3 is not possible.

It is of interest to continue the classification of the possible 
input patterns into different groups of two classes until all possible 
combinations have been examined. Using this procedure, both the number 
of linearly separable and the number of not linearly separable examples 
may be determined for a two input Adaline. The digital outputs for the 
two classes will be: 

Class (a) +1, and

class (b) -1.
Table 1 contains eight examples illustrating this concept of linear
separability.

It can be seen that examples 1 thru 6 in the table could be repeated
with the desired digital outputs reversed, i.e., with a -1 output for
class (a) and a +1 output for class (b). This procedure would double
the number of linearly separable classifications, and with the addition
of the case illustrated as example 7 and its counterpart (for all inputs
to be classified as (b)), would yield a total of fourteen examples which
are linearly separable. The interchange of the +1 and -1 of example 8
would yield a second example which is not linearly separable. Thus the
total number of separable and not separable classifications into two
classes is equal to 16.

Table 1. Separable and Not Separable Examples For Two Binary Inputs

Desired Classification      Graphical           One Set of w_i For
of Input Patterns           Representation      Linear Separation

(a) One or both p_i +                           w_1 = w_2 = 1
(b) Both p_i -                                  w_0 = 1
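
The count of fourteen separable and two not separable classifications can be checked by brute force. The sketch below is an illustration only: it searches a small, arbitrarily chosen grid of integer weights, so a failed search is merely suggestive of non-separability, not a proof; for two inputs, however, the grid happens to be large enough to find weights for every separable case. The names `PATTERNS` and `find_weights` are assumptions of the sketch.

```python
from itertools import product

PATTERNS = list(product((1, -1), repeat=2))   # the four two-input patterns


def find_weights(labels, weight_values=(-2, -1, 0, 1, 2)):
    """Return one (w_0, w_1, w_2) realizing the labels, or None if the grid has none."""
    for w0, w1, w2 in product(weight_values, repeat=3):
        outputs = [w0 + w1 * p1 + w2 * p2 for p1, p2 in PATTERNS]
        # Every analog output must be non-zero and carry the desired sign.
        if all(out * label > 0 for out, label in zip(outputs, labels)):
            return (w0, w1, w2)
    return None


# All 16 ways of assigning +1 or -1 to the four patterns.
results = {labels: find_weights(labels) for labels in product((1, -1), repeat=4)}
n_separable = sum(1 for w in results.values() if w is not None)
print(n_separable, "separable,", len(results) - n_separable, "not separable")
```

The two assignments for which no weights are found are the "alternate signs" classification and its reverse.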

Suppose now that the number of binary inputs to an Adaline is 3
rather than 2, and that it is desired to classify the input patterns
into the two classes:

(a) p_1 = +1, p_2 = ±1, p_3 = ±1; and

(b) p_1 = -1, p_2 = ±1, p_3 = ±1.

If the weights are chosen to be w_1 = 1 and w_0 = w_2 = w_3 = 0, it is
easily seen that the digital output will be +1 for all the patterns in
class (a), and -1 for those in class (b). Thus separation has been
achieved by the plane p_1 = 0.


Table 2. Number of Linearly Separable Classifications
for Different Numbers of Binary Inputs

Number of            Maximum Possible Number    Number of Known Linearly
Binary Inputs (n)    of Examples (2^{2^n})      Separable Examples

      1                          4                           4
      2                         16                          14
      3                        256                         104
      4                     65,536                       1,882
      5              4,294,967,296                      94,572
      6          about 1.8 x 10^19                  15,028,134

In general, if there are n binary inputs to an Adaline, the two
classes will be separable if the w_i can be chosen in such a way that the
(n-1)-dimensional hyperplane w_1 p_1 + w_2 p_2 + ... + w_n p_n + w_0 = 0
separates the two classes in n-space.

It was shown above that, for an Adaline with two binary inputs,
the total number of input-output situations was 16. An extension of
this reasoning from 2 to n binary inputs indicates that there are 2^{2^n}
possible examples, again including both linearly and not linearly
separable types. Table 2, which was extracted from [2], lists this
figure together with the number of known linearly separable examples,
for n from 1 to 6.








4. Threshold and Dead Zone. 

Linearly separable input patterns cannot normally be separated
unless the weights are carefully chosen. The process of changing the
weights until each weight has the value most likely to effect the linear
separation of the input patterns is called adaption or training.

One of the weights adjusted during adaption is w_0, the threshold
weight. First, in relation to the binary or quantized output, we see
that the output changes sign when \sum_{i=1}^{n} p_i w_i = -w_0, and the
threshold can therefore be regarded simply as a bias on the quantizer,
as shown in figure 4. Alternatively, if we consider the geometry of the
hyperplane, it can be seen that a change of the threshold from w_0 to
w_0' effects a shift of the hyperplane to a new location parallel to its
original position, as illustrated in figure 5 for n = 2. In fact, the
perpendicular distance of the hyperplane from the origin, d, will be

     d = \frac{|w_0|}{(w_1^2 + w_2^2 + \cdots + w_n^2)^{0.5}}.

It should also be noted that the slope of the hyperplane is not a
function of the threshold, but instead is determined by the settings of
the other weights.
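
As a small numerical illustration of the role of the threshold weight, the following sketch computes the perpendicular distance of the separating hyperplane from the origin for two hypothetical weight settings that differ only in w_0. The function name and the example weights are assumptions chosen for the illustration.

```python
import math


def hyperplane_distance(weights):
    """Distance from the origin to the hyperplane w_0 + sum(w_i * p_i) = 0.

    weights[0] is the threshold weight w_0; weights[1:] are w_1 ... w_n.
    """
    w0, others = weights[0], weights[1:]
    return abs(w0) / math.sqrt(sum(w * w for w in others))


# Same w_1 and w_2 (same slope), two different thresholds: parallel lines
# at different distances from the origin.
print(hyperplane_distance([1.0, 1.0, 1.0]))
print(hyperplane_distance([0.5, 1.0, 1.0]))
```

Halving the threshold halves the distance while leaving the orientation of the line unchanged.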





Fig. 5. Separating Lines as a Function of Threshold Weight

For the situation depicted in figure 6, any of the three hyperplanes
determined by the thresholds w_0, w_0' and w_0'' will linearly separate
the input patterns into the two desired output classes a and b. However,
use of the w_0' or w_0'' hyperplanes would not provide the safety that
might be needed in a practical situation. Some of the factors that
might cause classification errors are resolution of the quantizer and
drift in the circuit components. As a result, the input to the quantizer
must always be sufficiently different from zero so that a small change
in the quantizer input after adaption will not yield an incorrect digital
output. In addition, the problem of resolution and drift is magnified
by the fact that, as the number of binary inputs increases, the tolerance
required on each weight becomes more critical. This is discussed in [3].

Now what can be done to decrease the effect of such errors on the
Adaline system? One technique would be to utilize the w_0 hyperplane
as the separating plane, with the w_0' and w_0'' hyperplanes as the outer
limits of a buffer zone about the separating plane. The system would
then be trained so that all possible input patterns would be classified
into classes above and below the buffer zone, with a consequent reduc-
tion of the danger of producing an incorrect output from the quantizer.
For convenience, each of the spaces on either side of the separating
plane will be called a "dead zone" as labeled in figure 6.

Fig. 6. Buffer and Dead Zone





5. Multilevel Adaline. 

It is often necessary to separate the input patterns into more than
two classes. In some cases this can be done effectively with a multi-
level Adaline, which separates the input patterns into classes according
to the analog output (level) assigned to each pattern. For this type of
separation to be possible the input patterns or sets of input patterns
must be linearly separable in a specific fashion. The several assigned
output levels may be regarded as equivalent to a corresponding set of
threshold weights. These in turn define a set of parallel hyperplanes
such as shown in figure 6 for w_0, w_0' and w_0''. Therefore, the use of
the multilevel Adaline is restricted to the classification of inputs which
can be linearly separated from each other by a series of parallel hyper-
planes.

A typical use of a multilevel Adaline would be to classify the
written digits one through nine. As a simple test of this concept a
computer simulated Adaline was used to separate one sample of each of
these digits. The written digits were converted into patterns using a
seven by seven input space. After training, all nine patterns were
correctly classified, showing that this particular set of patterns was
separable in this manner.

6. Adaline Adaption.

The previous sections contained a discussion of Adaline fundamentals 
and the concept of separating pattern classes by the appropriate setting 
of the weight elements. Training or adaption is the process of iter- 
atively determining suitable values for the weight elements. This 
section will formulate the adaption problem and then consider four 
possible adaption or training procedures. 

Adaline application in a pattern recognition problem involves the 
following sequence of events. Firstly, training patterns are chosen and 
assigned digital outputs, i.e. plus or minus one. Then an adaption 
process is selected and Adaline is trained until it either correctly 
identifies all the patterns, or until the training is terminated due to 
a failure of the training scheme to converge. If the training was not 
so terminated, the Adaline would now have the capability to correctly 
identify all the training patterns (thus proving the pattern classes to 
be separable) and in addition, the ability to recognize a number of
"similar" patterns. In fact, one of Adaline's main advantages is the
ability to identify patterns it has not been trained on. Other methods
of pattern recognition such as "table look up" do not generally have this 
capability. 

One question now to be answered is, what constitutes a "similar"
pattern? One measure of the similarity of two patterns would be the
number of pattern elements that would have to be changed in one pattern
to make the two patterns identical. Figure 7 displays three patterns,
each containing 16 pattern elements. Pattern (a) differs from pattern
(b) by one pattern element, while pattern (a) differs from pattern (c)
by eight pattern elements. Patterns (a) and (b) would be classified as
"similar" patterns while patterns (a) and (c) would be classified as
dissimilar patterns if the above criterion were used. In some cases,
however, a pattern might be regarded as "similar" to a training pattern
if it were related to it, for example, by a simple rotation or transla-
tion. In that case patterns (a) and (c) would be regarded as similar.
This criterion might be used if the Adaline was being specifically
trained to recognize patterns even when translated or rotated from their
normal configuration.


[Figure 7: three 4 x 4 patterns of X and o elements, labeled (a), (b) and (c)]

Fig. 7. Similar and Dissimilar Patterns
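
The first similarity measure, the number of pattern elements that differ, is easy to state in code. The three 4 x 4 patterns below are hypothetical stand-ins chosen to reproduce the counts quoted in the text (one and eight differing elements); they are not claimed to be the exact patterns of figure 7. Pattern C is pattern A translated one element down and one to the right.

```python
def parse(rows):
    """Turn rows of 'X' / 'o' characters into a flat list of +1 / -1 pattern elements."""
    return [1 if ch == "X" else -1 for row in rows for ch in row]


def differing_elements(a, b):
    """Number of pattern elements that must change to turn pattern a into pattern b."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical 16-element patterns, for illustration only.
A = parse(["XXXo", "oXoo", "oXoo", "oooo"])
B = parse(["XXXo", "oXoo", "oooo", "oooo"])   # A with one element changed
C = parse(["oooo", "oXXX", "ooXo", "ooXo"])   # A shifted down and to the right

print(differing_elements(A, B), differing_elements(A, C))
```

Under this measure A and B are "similar" and A and C are not, although C is only a translated copy of A.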

Finally, a "similar" pattern is often generated by the contamination
of a training pattern by noise. For this reason "similar" patterns are
sometimes referred to as "noisy" patterns. The ability of Adaline to
recognize similar patterns is clearly an extremely complex function of
the total number of pattern elements in the input pattern, the number
and complexity of the training patterns, the value of the weight
elements, and the degree of similarity of the input pattern to a train-
ing pattern.

It was shown earlier that the analog output is imposed on a
quantizer whose digital output is either plus or minus one. It is
usually desirable to endow the quantizer with a dead zone, so that some
finite magnitude of the analog output is required before the digital
output becomes non-zero. During the training or adaption phase the
analog output of each training pattern is made equal to or greater than
an "adaption dead zone". After training, a smaller value of dead zone
may be introduced which will often enhance the probability of correctly
identifying similar patterns. Thus patterns will be classified accord-
ing to the relationship of their analog outputs to a "recognition dead
zone".
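
A quantizer with a dead zone can be sketched as a three-valued function. The analog outputs and the two dead zone values used below are hypothetical numbers chosen only to show how a smaller recognition dead zone lets more patterns receive a non-zero decision; the function name is likewise an assumption.

```python
def quantize(analog, dead_zone):
    """Return +1 or -1 outside the dead zone, and 0 (no decision) inside it."""
    if analog >= dead_zone:
        return 1
    if analog <= -dead_zone:
        return -1
    return 0


# Hypothetical analog outputs for four input patterns.
analog_outputs = [1.2, 0.4, -0.7, -0.05]

adaption = [quantize(a, 1.0) for a in analog_outputs]      # dead zone used in training
recognition = [quantize(a, 0.3) for a in analog_outputs]   # smaller dead zone afterwards
print(adaption)
print(recognition)
```

With the larger dead zone only the first output yields a decision; with the smaller one the first three do.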

The adaption or training problem can be formulated as:

Given: A collection of patterns and the desired digital
output for each pattern.

Find: The value of the weight elements such that the
expression w_0 + \sum_{i=1}^{n} p_i w_i yields the appropriate analog
output for each pattern.

The weight values are usually determined by an iterative process 
based on a comparison of the actual analog output for each training 
pattern with the corresponding desired analog output. In other words, 
an input pattern is imposed on Adaline, the analog output is examined, 
the weight elements are changed if required, and then another pattern is 
imposed. This process is continued until the analog output of each 
pattern is acceptable without further adaption, or until it is determined 
that the process will not converge. The adaption process will be defined 
as having converged when each training pattern generates an acceptable 
output. It should be noted that an adaption process may not converge to
a solution even though the pattern classes themselves are linearly
separable. In such a case a different adaption technique would have to
be employed.

7. Specific Adaption Schemes.
Common to each of the adaption schemes discussed below are the
following rules.
When an analog output is unacceptable:
1. Change each of the weight elements by the same absolute
   magnitude.
2. Increase or decrease each weight according to whether the
   product of the corresponding pattern element by the desired
   change of the analog output is positive or negative. For
   example, if the analog output is to be increased and p_i
   is -1, then w_i would be decreased.
A description of the four adaption schemes listed in [4] now follows.
Minimum Square Error Adaption
This process adjusts the weight elements in such a manner that the
analog output for each pattern is driven toward the same absolute magni-
tude, called the adaption dead zone. The error at each step is defined
as the difference between the adaption dead zone and the actual analog
output, and the change made to each weight element is expressed by the
following equation:

     |\Delta w_i| = \frac{\text{Error} \times \text{Proportional Constant}}{\text{Total Number of Weights}}

If the proportional constant is greater than zero and less than two,
the process will converge provided that the patterns are both linearly
separable and meet an additional requirement that will be discussed
later. In the case where the weight elements are continuously variable,
it has been experimentally determined (Appendix III) that the number of
iterations required to converge in a typical situation approaches a
minimum when the proportional constant is approximately equal to one.










The criterion requiring identical magnitudes for the analog outputs
for all input patterns causes an extremely large number of iterations.
Therefore, a tolerance is usually established around the adaption level,
defined as the minimum square error bound, and no changes are made to the
weight elements if the analog output falls within this bound.

This adaption scheme also suffers from the disadvantage that there
is no assurance of convergence, even if the pattern classes are known to
be linearly separable. If the number of input patterns is equal to or
greater than one plus the number of weight elements, there is a possi-
bility that the adaption will not converge. The two-input pattern space
considered in section 3 will be used to illustrate this limitation. It
can be seen in Figure 8 (a) that the classes can be separated in such a
way that the analog output has the same magnitude for each pattern. This
is not true in the case of the configuration of Figure 8 (b), and the
minimum square error procedure, as described above, will not converge
even though the classes are linearly separable.
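
The minimum square error rule and its limitation can be illustrated with a short simulation. This is a sketch written for this discussion, not the Fortran procedure of Appendix I: the function name, the zero starting weights, the pass limit and the two example problems are assumptions. The threshold weight is treated as a weight on a constant +1 input, and the desired analog output is taken to be the desired digital output times the adaption dead zone.

```python
def train_mse(patterns, desired, dead_zone=1.0, constant=1.0,
              bound=0.05, max_passes=1000):
    """Drive every analog output toward desired * dead_zone.

    Returns (weights, passes) on convergence, or (None, max_passes) otherwise.
    """
    n_weights = len(patterns[0]) + 1            # pattern weights plus w_0
    weights = [0.0] * n_weights
    for passes in range(1, max_passes + 1):
        changed = False
        for pattern, d in zip(patterns, desired):
            inputs = (1,) + tuple(pattern)      # constant +1 input feeds w_0
            analog = sum(w * p for w, p in zip(weights, inputs))
            error = d * dead_zone - analog
            if abs(error) > bound:              # outside the error bound: adapt
                step = constant * error / n_weights
                weights = [w + step * p for w, p in zip(weights, inputs)]
                changed = True
        if not changed:                         # a full pass with no corrections
            return weights, passes
    return None, max_passes


# Three patterns and three weights: equal-magnitude outputs are attainable.
easy = train_mse([(1, 1), (1, -1), (-1, -1)], [1, 1, -1])
# All four patterns of the "one or both positive" classification: linearly
# separable, but no weights give all four outputs the same magnitude.
hard = train_mse([(1, 1), (1, -1), (-1, 1), (-1, -1)], [1, 1, 1, -1])
print(easy)
print(hard)
```

The second run stops at the pass limit without converging, which matches the limitation described above: for those four patterns the analog outputs cannot all have the same magnitude, so at least one error always stays outside a bound this small.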

The next three adaption procedures have been proved to converge to
a solution provided that the pattern classes are linearly separable.
These schemes compare the adaption dead zone value with the actual analog
output and, if the magnitude of the analog output is less than the adap-
tion dead zone value, some method will be employed to adjust the weights
in such a way as to increase the analog output magnitude. However, if
the analog output magnitude is equal to or greater than the adaption dead
zone, no changes are made.

Fig. 8. Separation With Minimum Square Error Procedure.
Incremental Adaption 
When the analog output magnitude is less than the adaption dead
zone value, all the weight elements are changed by an amount:

     |\Delta w_i| = \frac{\text{Incremental Constant} \times \text{Adaption Dead Zone}}{\text{Total Number of Weight Elements}}

The number of iterations required to converge, and the final value of the
weight elements, are functions of the incremental constant. Note that
the corrections are independent of the errors.
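
A minimal sketch of incremental adaption follows. It is an illustration under stated assumptions, not the procedure of Appendix I: the output is measured along the desired sign (desired output times analog output), so that an output of the wrong sign is also corrected; the weights start at zero; and the incremental constant of 0.75 is chosen only because it makes the step an exact binary fraction.

```python
def train_incremental(patterns, desired, dead_zone=1.0, constant=0.75,
                      max_passes=1000):
    """Fixed-size corrections until every output clears the adaption dead zone."""
    n_weights = len(patterns[0]) + 1            # pattern weights plus w_0
    weights = [0.0] * n_weights
    step = constant * dead_zone / n_weights     # same magnitude for every correction
    for passes in range(1, max_passes + 1):
        changed = False
        for pattern, d in zip(patterns, desired):
            inputs = (1,) + tuple(pattern)      # constant +1 input feeds w_0
            analog = sum(w * p for w, p in zip(weights, inputs))
            if d * analog < dead_zone:          # output has not cleared the dead zone
                weights = [w + step * d * p for w, p in zip(weights, inputs)]
                changed = True
        if not changed:
            return weights, passes
    return None, max_passes


PATTERNS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
DESIRED = [1, 1, 1, -1]                         # class (a): one or both inputs positive
weights, passes = train_incremental(PATTERNS, DESIRED)
print(weights, passes)
```

For this problem and these settings the run ends with all three weights equal to 1.0 after four passes through the patterns.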
Relaxation Adaption

This procedure is similar to the minimum square error adaption
except that no corrections are made when the analog output magnitude
is equal to or greater than the adaption dead zone. The adaption pro-
cess will converge if the proportional constant is between zero and two.
The weight elements are changed, if required, by an amount:

     |\Delta w_i| = \frac{(\text{Adaption Dead Zone} - \text{Analog Output}) \times \text{Proportional Constant}}{\text{Total Number of Weight Elements}}
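
Relaxation adaption can be sketched in the same way. Again this is an illustration under assumptions rather than the Fortran of Appendix I: the output is measured along the desired sign, the weights start at zero, and a small numerical `tolerance` (an addition made for this sketch) treats outputs within rounding distance of the dead zone as acceptable, since corrections that aim exactly at the dead zone may approach it only gradually.

```python
def train_relaxation(patterns, desired, dead_zone=1.0, constant=1.0,
                     tolerance=1e-9, max_passes=1000):
    """Correct only the outputs that fall short of the adaption dead zone."""
    n_weights = len(patterns[0]) + 1            # pattern weights plus w_0
    weights = [0.0] * n_weights
    for passes in range(1, max_passes + 1):
        changed = False
        for pattern, d in zip(patterns, desired):
            inputs = (1,) + tuple(pattern)      # constant +1 input feeds w_0
            analog = sum(w * p for w, p in zip(weights, inputs))
            if d * analog < dead_zone - tolerance:
                # Correction proportional to the shortfall from the dead zone.
                step = constant * (d * dead_zone - analog) / n_weights
                weights = [w + step * p for w, p in zip(weights, inputs)]
                changed = True
        if not changed:
            return weights, passes
    return None, max_passes


PATTERNS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
DESIRED = [1, 1, 1, -1]                         # class (a): one or both inputs positive
weights, passes = train_relaxation(PATTERNS, DESIRED)
print(weights, passes)
```

Because each correction shrinks as the output nears the dead zone, the last passes make very small changes, which is the slow final approach that the next scheme is meant to avoid.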


Modified Relaxation Adaption 

A disadvantage of the minimum square error and relaxation adaption
procedures is the large number of iterations required for convergence.
When the difference between the desired analog output and the actual
analog output is small, a small correction is made. Thus, the closer
the process gets to the solution, the smaller the magnitude of the
corrective changes. The modified relaxation adaption procedure over-
comes this difficulty by correcting to a value larger than the adaption
dead zone. This value, usually 1.1 to 1.5 times the adaption dead zone
magnitude, is defined as the adaption level. No corrections are made if
the output magnitude exceeds the adaption dead zone. The equation for
the change of weight elements, where the proportional constant should
again be between zero and two for convergence, is:

     |\Delta w_i| = \frac{(\text{Adaption Level} - \text{Analog Output}) \times \text{Proportional Constant}}{\text{Total Number of Weight Elements}}
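
The modified relaxation rule differs from plain relaxation only in the target of each correction. The sketch below is illustrative and rests on assumptions: an adaption level of 1.25 times the dead zone (inside the 1.1 to 1.5 range quoted above), zero starting weights, and the output measured along the desired sign. It is not the Fortran of Appendix I.

```python
def train_modified_relaxation(patterns, desired, dead_zone=1.0, level_factor=1.25,
                              constant=1.0, max_passes=1000):
    """Correct outputs below the dead zone, aiming at the larger adaption level."""
    n_weights = len(patterns[0]) + 1            # pattern weights plus w_0
    level = level_factor * dead_zone            # adaption level, above the dead zone
    weights = [0.0] * n_weights
    for passes in range(1, max_passes + 1):
        changed = False
        for pattern, d in zip(patterns, desired):
            inputs = (1,) + tuple(pattern)      # constant +1 input feeds w_0
            analog = sum(w * p for w, p in zip(weights, inputs))
            if d * analog < dead_zone:          # test against the dead zone ...
                # ... but correct toward the adaption level.
                step = constant * (d * level - analog) / n_weights
                weights = [w + step * p for w, p in zip(weights, inputs)]
                changed = True
        if not changed:
            return weights, passes
    return None, max_passes


PATTERNS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
DESIRED = [1, 1, 1, -1]                         # class (a): one or both inputs positive
weights, passes = train_modified_relaxation(PATTERNS, DESIRED)
print(weights, passes)
```

Since every correction overshoots the dead zone by a fixed margin, the corrections cannot become arbitrarily small, and the process stops after a finite number of them when the classes are separable.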
8. Evaluation of Adaption Procedures.

The different characteristics of the adaption procedures were in-
vestigated by the solution of some typical problems. In all cases,
Adaline was simulated on a CDC 1604 digital computer using one of the
adaption procedures which are defined by the Fortran statements listed
in Appendix I. In one example the adaption process is examined after
each iteration (Appendix II), while in another a test is described of
Adaline's ability to correctly identify a number of patterns upon which
it had not been trained (Appendix IV).

If training patterns are not separable, the training process will
not converge and must be arbitrarily terminated. The Adaline will then
fail to identify all of the training patterns correctly, but if the
number of failures is small this may be tolerable in some applications.
When it comes to the recognition of noisy versions of the training
patterns, it must be expected that Adaline will only recognize a sta-
tistical percentage of the similar patterns presented. The only method
of ensuring that Adaline will correctly identify all possible input
patterns is to train on all the conceivable patterns. But this is
then a form of "table look up" recognition, which can be performed by
other means without the necessity of employing an iterative scheme.
Appendix V summarizes the results of training Adaline on pattern classes
that are not separable.

No attempt will be made to promote the use of one adaptive procedure
in lieu of the others. It can be noted that the modified relaxation
procedure usually requires the fewest iterations to converge,
but results in a wide spread in the analog output values. On the other
hand, the minimum square error adaption requires more iterations to con-
verge but has a narrow spread in the analog output values.
9. Possible Adaline Applications.

The output of a trained Adaline can be regarded as a binary digit,
or logical decision, whose value depends upon the pattern input to the
device. It follows, therefore, that an Adaline can, in principle, be
applied in any situation where a decision is to be made on the basis of
some "input" to the decision making device. In particular, the Adaline
concept is valuable in situations where performance can or must be
improved as experience accumulates.

The basic difficulties in the application of Adalines relate firstly
to the problem of converting an input into a pattern, and secondly to the
development of a suitable training procedure which will ensure that the
Adaline does in fact improve its performance with practice. The following
three sections will consider a few of the possible Adaline applications.








10. Servo-Mechanism Controller. 
A single Adaline with its digital output can be used as a bang-bang
servo-mechanism controller as in figure 9.

[Figure 9: block diagram containing an Adaline Controller block]

Fig. 9. Basic Adaline Servo-Mechanism Controller

Graduate students at Stanford University have used an Adaline in 
such a control problem. The plant consisted of a rolling cart powered by 
a reversible electric motor. Installed on the cart was an inverted 
pendulum. The Adaline controller was trained to keep the pendulum in a 
vertical position without extreme excursion of the cart in either direc-
tion. Four plant variables were measured: the position and velocity of
the cart and of the pendulum.

The direction the electric motor should rotate, and thus the desired
Adaline digital output, is a function of the four measured variables.
The value of each variable can be cataloged into one of several distinct
levels, and each level can in turn be represented by a code consisting of
a series of pattern elements. These pattern codes must be carefully
chosen to ensure that the pattern classes are linearly separable. The
complete input pattern is composed of the pattern elements of the four
variables.

The Adaline is trained by permitting it to observe the performance
of another type of controller. The "correct" response is then available
at all times and the Adaline weights can be adjusted to bring the Adaline
output into agreement. After the training is completed, the Adaline can
take over the operation of the plant.
11. Speech Recognition.

A real time speech recognition system [5] has been constructed at
Stanford University. The system, which consists of several parallel
Adalines, has the capability of converting speech into typewritten words.
Since the operation of parallel Adalines will be discussed in a later
section, only the coding technique will be discussed here.

The main problem encountered in coding was the choice of parameters
to describe the sound of a spoken word. Bandpass filters were employed
to separate the sound energy into eight frequency bands and the sound
intensity in each band was then digitally coded according to the ampli-
tude level. Four levels were chosen corresponding to the three bit
patterns, 000, 001, 011, or 111, where 1 equals +1 and 0 equals -1.
Ten samples were taken during the utterance of a word, so that each
filter generated 30 pattern elements. The complete pattern from all
eight filters therefore consisted of 240 pattern elements.
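
The coding arithmetic can be made concrete with a short sketch. The four three-bit codes are those given in the text; everything else (the function name, the ordering of the elements within the pattern, and the made-up amplitude levels) is an assumption for illustration and is not taken from the Stanford system.

```python
# Amplitude level -> three pattern elements (000, 001, 011, 111 with 0 -> -1, 1 -> +1).
LEVEL_CODES = {
    0: (-1, -1, -1),
    1: (-1, -1, 1),
    2: (-1, 1, 1),
    3: (1, 1, 1),
}


def encode_word(levels):
    """levels[s][f] is the amplitude level (0..3) of filter f at sample s.

    The sample-major ordering of the output is an assumption of this sketch.
    """
    pattern = []
    for sample in levels:
        for level in sample:
            pattern.extend(LEVEL_CODES[level])
    return pattern


# Made-up levels for 10 samples of 8 filters each.
samples = [[(s + f) % 4 for f in range(8)] for s in range(10)]
pattern = encode_word(samples)
print(len(pattern))   # 10 samples x 8 filters x 3 elements
```

Each step from one code to the next changes a single pattern element, so neighbouring amplitude levels produce patterns that differ in only one element.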
12. Weather Forecasting.

Adalines have been applied in the area of weather forecasting to
the extent of predicting "fair" or "rain" in one locality [6]. In
this application parallel Adalines were trained to interpret weather
maps. In particular, they were trained to "read" surface pressure maps
of a 500,000 square mile area.

The weather map was divided into 48 regions each of approximately
600 square miles. Then, the expected range between the highest and
lowest pressures was divided into ten levels, each of which was repre-
sented by one of the ten digits, 0 through 9. Thus, each input pattern
contained 48 pattern elements each of which could acquire one of ten
values (as compared with the usual two).

The results obtained from the Adalines were comparable with those
obtained from "human" weather forecasters.
13. Classification When Pattern Classes Are Not Linearly Separable.
Previous sections have discussed the basic structure of an Adaline
and have detailed several schemes for the use of an Adaline in pattern
recognition and classification. The discussion thus far has been limited
to the separation of sets of input patterns into classes by a single
Adaline. It was found that the classification of input patterns into
two classes could not always be effected with a single Adaline. This
situation was illustrated by the not linearly separable example shown in
figure 3. In that example it was desired to classify all input patterns
into the two classes:

(a) both p_i positive, or both p_i negative, and

(b) p_i's with alternate signs.
This separation can be accomplished by a system using two Adalines in
parallel. The weights of the two Adalines define two hyperplanes, which
effect the desired separation as illustrated in figure 10. The overall
system consists of two logic layers, the first layer being composed of
Adalines with adaptive elements and the second layer made up of a fixed
logic element or threshold device. In the first layer each Adaline
attempts a classification of input patterns into linearly separable
classes while the second layer combines the outputs of the first layer
to complete the desired classification. Using the example of figure 10,
the inputs would be classified in the first logic layer as follows:
Adaline One   a) both p_i positive
              b) all others
Adaline Two   a) both p_i negative
              b) all others
This classification would place the two hyperplanes as in figure 10.

[Figure 10: two-input pattern space showing the hyperplane from Adaline
One and the hyperplane from Adaline Two]

Fig. 10. Linear Separability With Multiple Adalines
The next step would be to insert the Adaline outputs into the second
layer, which is an "or" gate¹ (in this case), in order to realize the
desired output classification. This process is summarized in table 3.


          Table 3. Separation Using Two Logic Layers

        Inputs                     Outputs
      p_1    p_2     Adaline #1   Adaline #2   Or Gate
       1      1          1           -1           1
       1     -1         -1           -1          -1
      -1      1         -1           -1          -1
      -1     -1         -1            1           1
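
The two-layer arrangement described above can be sketched in a few lines of Python. This is only an illustration: the weights and bias values below are hypothetical choices that happen to place the two hyperplanes suitably, and they are not taken from the original work.

```python
def adaline(weights, bias, pattern):
    """Quantized Adaline output: +1 if the weighted sum is positive, else -1."""
    total = bias + sum(w * p for w, p in zip(weights, pattern))
    return 1 if total > 0 else -1


def or_gate(outputs):
    """Second-layer "or" element: +1 if one or more inputs are positive."""
    return 1 if any(o > 0 for o in outputs) else -1


# Hypothetical (weights, bias) pairs chosen for illustration only.
ADALINE_ONE = ((1.0, 1.0), -1.0)    # positive only when both inputs are +1
ADALINE_TWO = ((-1.0, -1.0), -1.0)  # positive only when both inputs are -1


def classify(pattern):
    """Return the first-layer outputs and the overall system output."""
    first_layer = [adaline(w, b, pattern) for w, b in (ADALINE_ONE, ADALINE_TWO)]
    return first_layer, or_gate(first_layer)


if __name__ == "__main__":
    for pattern in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        print(pattern, classify(pattern))
```

Neither Adaline alone separates the two classes, but the "or" of their outputs does.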





     It is apparent that the choice of weights (hyperplanes) in the
first level Adalines is dependent upon the logic device used in the
second layer and vice versa. It follows that the choice of a logic
device and the establishment of a training procedure for the Adalines
may prove very difficult if the result is not known in advance, as it
was here. In fact this would seem to be the major difficulty in this
scheme. One approach to this problem now follows.

¹ An "or" gate is a device which gives a positive output if one or
more of its n inputs are positive.

     There are many devices which could be used in the second layer to
combine the Adaline outputs, so that it is important to estimate which
of these devices is likely to be the best one for this purpose. The
previously mentioned "or" element can be regarded as a special quantizer
whose output is -1 unless at least one Adaline output is positive. At
the other extreme would be an element whose output is -1 unless all n
Adaline outputs are positive.

     In trying to find the "best" quantizing device for use in the second
logic layer it seems natural to try a device whose output becomes +1 when
about n/2 of the Adaline outputs become positive. It has been found in
earlier work that, in a system with an odd number of Adalines, a simple
output majority, or a second layer "threshold" of (n+1)/2, will realize
the classification of the greatest number of input pattern sets.
Similarly, for an even number of Adalines, second layer thresholds of
n/2 or (n+2)/2 will realize the highest percentage of classifications.
It should be noted that the criterion used in that work to determine the
"best" second layer threshold for general use was the classification of
the maximum number of input pattern sets. However, for any specific
patterns, or sets of patterns, the threshold which is "best" might be
anywhere from 1 to n.
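
The family of second-layer quantizers discussed above can be illustrated with a single threshold element. The sketch below is an assumption-laden illustration, not a description of any particular hardware: the function names are hypothetical, and the majority threshold is taken to be the smallest count that exceeds half of n.

```python
def threshold_element(adaline_outputs, k):
    """Output +1 when at least k of the Adaline outputs are positive, else -1."""
    positives = sum(1 for o in adaline_outputs if o > 0)
    return 1 if positives >= k else -1


def majority_threshold(n):
    """Smallest count that is a strict majority of n outputs."""
    return n // 2 + 1


if __name__ == "__main__":
    outputs = [1, -1, -1]
    n = len(outputs)
    print("or gate (k = 1):   ", threshold_element(outputs, 1))
    print("majority (k = %d):  " % majority_threshold(n),
          threshold_element(outputs, majority_threshold(n)))
    print("all positive (k = n):", threshold_element(outputs, n))
```

With k = 1 the element behaves as the "or" gate, and with k = n it behaves as the opposite extreme described in the text.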

     There is still the problem of choosing the "best" adaption scheme
for the first layer. One method that has been suggested makes use of
both the analog output data from each Adaline and the digital output
desired from the overall system. In the case where the second layer
element is an "or" gate, the procedure to be followed will depend on
the desired system output. Thus, if the desired system output is -1,
all Adalines must give a negative output and any Adaline which is giving
a positive output will have to be adapted. However, when all first layer
outputs are negative and the desired system output is +1, only one
Adaline need be adapted. In this case it has been suggested that
adaption be confined to the Adaline whose analog output needs the
smallest change to establish the required condition. If the second layer
element is a majority logic device, the same general procedure will be
followed. If, to obtain the desired system output, the output of at
least k Adalines must be revised, the k Adalines whose outputs require
the least change will be adapted. The idea behind this procedure is that
adaption should take place with the minimum of disturbance to the
previously established pattern of weights.
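
The selection rule just described can be sketched as follows. This is a minimal illustration under stated assumptions: the second layer is taken to be a threshold element that outputs +1 when at least k first-layer outputs are positive, "least change" is measured by the magnitude of the analog output, and the function name is hypothetical.

```python
def adalines_to_adapt(analog_outputs, desired, k):
    """Return the indices of the first-layer Adalines that should be adapted.

    analog_outputs: analog (unquantized) output of each Adaline
    desired:        desired system output, +1 or -1
    k:              second-layer threshold (k = 1 corresponds to an "or" gate)
    """
    positive = [i for i, s in enumerate(analog_outputs) if s > 0]
    if desired > 0:
        # At least k outputs must be positive.
        count = k - len(positive)
        candidates = [i for i, s in enumerate(analog_outputs) if s <= 0]
    else:
        # Fewer than k outputs may be positive.
        count = len(positive) - (k - 1)
        candidates = list(positive)
    if count <= 0:
        return []
    # Adapt those whose analog outputs lie closest to zero.
    candidates.sort(key=lambda i: abs(analog_outputs[i]))
    return sorted(candidates[:count])


if __name__ == "__main__":
    print(adalines_to_adapt([-5.0, -0.5, -3.0], 1, 1))   # "or" gate, want +1
    print(adalines_to_adapt([2.0, -1.0, 4.0], -1, 1))    # "or" gate, want -1
```

For the "or" gate this reproduces the two cases in the text: every positive Adaline is adapted when -1 is desired, and only the Adaline nearest the threshold is adapted when +1 is desired.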

     It may be noted that if it were not for the difficulty in choosing
a suitable logic for the second layer, and a training procedure for the
first, it would be theoretically possible to establish any desired
classification if sufficient Adalines were employed in the first layer.
In an extreme case, for example, the n+1 hyperplanes defined by the
weights of n+1 Adalines could separate one input from all others in
n-space if the weights could be chosen properly, and if the second logic
layer properly interpreted the outputs of the Adalines.
14. Madaline.

     Multiple Adalines in parallel, or a Madaline as this combination is
sometimes called, may also be used to classify a set of input patterns
into more than two output classes. If the input patterns can be
appropriately separated by each of the Adalines, Madaline can accomplish
the desired classification with the help of a digital output coding
scheme such as that illustrated in figure 11 and table 4. It is easily
seen that the maximum number of individual output codes available is
2^m, where m is the number of Adalines in the Madaline. As m = 3 in
figure 11 and table 4, it is readily apparent that eight input patterns
or pattern sets can be classified into eight different digital output
combinations in this situation.
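
The counting argument above can be checked with a short Python sketch. The assignment of codes to classes used here is arbitrary and purely illustrative; it is not the assignment of table 4, and the function names are hypothetical.

```python
from itertools import product


def output_codes(m):
    """All distinct output codes of m Adalines, each output being +1 or -1."""
    return list(product((1, -1), repeat=m))


def decode(outputs, code_table):
    """Look up the class assigned to a tuple of Adaline outputs (None if unassigned)."""
    return code_table.get(tuple(outputs))


if __name__ == "__main__":
    codes = output_codes(3)
    print(len(codes), "codes are available for m = 3")
    # Arbitrary illustrative assignment of classes 1 through 8 to the codes.
    table = {code: number for number, code in enumerate(codes, start=1)}
    print(decode([1, -1, 1], table))
```

Any one-to-one assignment of classes to codes can be decoded this way; whether each Adaline can actually be trained to produce its column of the code is a separate question.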

     Here the coding scheme is predetermined, so that the training
follows the procedure devised for a single Adaline. That is, each
Adaline is trained individually to generate the appropriate response to
each of the training patterns.


     The problem which may arise is related to the choice of codes for
the several pattern sets. If these are not properly chosen, then the
individual Adalines may well be attempting to separate the patterns into
classes which are not linearly separable. A change in the codes may
eliminate the problem in any given situation, but no systematic
procedure for choosing the coding scheme has been found.

     [Fig. 11. Madaline Output. The figure shows the Adaline outputs
     forming the output code, together with the desired output.]

     Also to be considered is the fact that the coding scheme may affect
the number of Adalines needed, and/or the number of adaptions necessary
during training. These considerations, however, seem to be less
important than that of the previous paragraph.

     For the example of figure 11 all eight possible binary output codes
are used. Note, however, that there are many possible choices for the
code to be assigned to each pattern or pattern set. One such choice is
shown in table 4.


          Table 4. Typical Madaline Output Coding Scheme

               Adaline 1   Adaline 2   Adaline 3
               Output      Output      Output
pattern 1 =] 1 
pattern 1 =] 
pattern =] = 1 
pattern l 1 
pattern l =] 
pattern o1 1 
pattern l 1 
pattern -l : o1 




     To summarize, each of the Adalines is trained to separate the input
patterns into two classes: (a) +1 output and (b) -1 output. Then, as an
input pattern is imposed on the Madaline, the Adalines simultaneously
determine their output responses. The outputs can then be examined to
properly classify the imposed pattern. The classification of the
written digits, one through eight, suggested herein was successfully
accomplished using three computer-simulated seven by seven Adalines.
15. Character Recognition

     In this section the application of a Madaline to the recognition and
classification of hand printed letters, numbers and other punctuation
symbols will be discussed. For definiteness, the alpha-numeric character
set chosen corresponded to that used in Fortran computer coding, and the
aim was to evaluate the possibility of "reading" hand printed Fortran
symbols using a Madaline.

     The investigation was divided into two parts. The first consisted
of an evaluation of the feasibility of classifying the ten numeric
characters using a Madaline, while the second consisted of an attempt
to classify 45 Fortran characters using a similar device.

     In the first part of the investigation the Madaline consisted of
four Adalines, the minimum number required, each with a seven by seven
input space. The first tests were conducted using patterns of ten digits
obtained from five different persons, each of whom wrote one sample of
each of the ten digits on a standard Fortran coding form. The written
digits were enlarged by projection on a screen and a seven by seven grid
form was used to determine the actual input patterns used. Each test
digit was centered on the seven by seven grid form and a digit was
defined to be "in" a grid space if it entered in such a manner that both
sides of the line could be seen inside that space. The tests, which are
detailed in Appendix VI, were conducted using the patterns obtained in
this manner. In one of these tests, the Madaline was trained four times
on each of the fifty different input patterns and was then asked to
classify each input pattern. All input patterns were classified
correctly.
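
The conversion of a gridded character into an Adaline input pattern might be sketched as below. The representation is an assumption made for illustration only: an occupied grid space is written as "X" and mapped to +1, an empty space is written as "o" and mapped to -1, and the function name is hypothetical. The decision as to whether the written line is "in" a grid space is taken as already made.

```python
def grid_to_pattern(rows, size=7):
    """Flatten a size-by-size grid of 'X'/'o' marks into a list of +1/-1 inputs."""
    if len(rows) != size or any(len(row) != size for row in rows):
        raise ValueError("expected a %d by %d grid" % (size, size))
    return [1 if cell == "X" else -1 for row in rows for cell in row]


if __name__ == "__main__":
    # A crude vertical stroke, used here only as example data.
    stroke = ["oooXooo"] * 7
    pattern = grid_to_pattern(stroke)
    print(len(pattern), "pattern elements,", pattern.count(1), "of them +1")
```

Each seven by seven character thus supplies 49 pattern elements to every Adaline in the Madaline.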


     The second part of this investigation was again conducted with the
minimum number of Adalines required. Six Adalines, each with a seven by
seven input space, were used to classify the 45 Fortran symbols. The
results of this classification are detailed in Appendix VII.
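
The minimum number of Adalines quoted for each part of the investigation follows from the requirement that m Adalines provide at least as many distinct output codes (2^m) as there are classes. A minimal sketch of that calculation, with a hypothetical function name:

```python
def min_adalines(num_classes):
    """Smallest m such that 2**m >= num_classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return (num_classes - 1).bit_length()


if __name__ == "__main__":
    for classes in (8, 10, 45):
        print(classes, "classes require at least", min_adalines(classes), "Adalines")
```

This gives three Adalines for eight classes, four for the ten digits and six for the 45 Fortran symbols, in agreement with the figures used in the text.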
16. Summary and Acknowledgements.

     This paper contains a survey of the pattern recognition problem as
approached through the application of adaptive linear neuron devices.
Some of the basic Adaline concepts were verified experimentally and some
concepts were amplified. In particular, an example was shown of the
inability of the minimum square error adaption procedure to separate
input patterns that were linearly separable, and a method was developed
to display the convergence characteristics of the individual adaption
procedures. In addition, preliminary tests to determine the feasibility
of utilizing a Madaline for character recognition indicated that a
possible application exists in this field.

     An interesting project, which is a natural follow up of the work
detailed herein, would be that of using an actual laboratory set up to
more completely determine the feasibility of utilizing Madaline for
character recognition. Perhaps this could be accomplished using
photocells for input pattern detection. Another carry over project using
Adalines would be to continue the weather forecasting analysis
previously discussed [6]. The wealth of weather data available at this
location makes this a particularly feasible project.

     The authors wish to acknowledge the guidance and assistance given
to them by Dr. J. R. Ward.
APPENDIX I
FORTRAN STATEMENTS OF ADAPTION PROCEDURES

     The Fortran statements used to simulate the four adaption procedures
are listed below. The program was written for an Adaline having 17
weight elements. It is to be noted that this is not the complete program
but only the adaption techniques.

Abbreviations used in the program:

     ADAPT      Adaption level
     BETA       Increment constant
     BMSE       Bound for minimum square error
     DELTA      Dead zone level
     PAT(I,J)   Input pattern, I=pattern element, J=input pattern
                number
     PIE        Proportional constant
     SGN(J)     Desired binary output of pattern J
     SUM        Analog output of pattern J
     WEY(I)     Value of weight element I

C     MINIMUM SQUARE ERROR ADAPTION
C     ADAPT ONLY IF THE ERROR MAGNITUDE EXCEEDS THE BOUND BMSE
      ERROR=SGN(J)*DELTA-SUM
      ABER=ABSF(ERROR)
      IF(ABER-BMSE) 100,100,50
   50 ENORM=PIE*(ERROR/17.0)
      DO 60 I=1,17
   60 WEY(I)=WEY(I)+ENORM*PAT(I,J)
  100 CONTINUE

C     INCREMENTAL ADAPTION
C     ADAPT BY A FIXED INCREMENT BETA UNLESS THE OUTPUT HAS THE
C     DESIRED SIGN AND LIES OUTSIDE THE DEAD ZONE
      IF(SGN(J)) 256,254,254
  254 IF(SUM) 258,258,255
  255 IF(SUM-DELTA) 258,262,262
  256 IF(SUM) 257,259,259
  257 IF(SUM+DELTA) 262,262,259
  258 SIGN=+1.0
      GO TO 260
  259 SIGN=-1.0
  260 DO 261 I=1,17
  261 WEY(I)=WEY(I)+SIGN*BETA*PAT(I,J)
  262 CONTINUE

C     RELAXATION ADAPTION
C     SAME TEST AS ABOVE, BUT THE OUTPUT IS MOVED TOWARD THE DEAD
C     ZONE LEVEL DELTA
      IF(SGN(J)) 306,304,304
  304 IF(SUM) 310,310,305
  305 IF(SUM+.0005-DELTA) 310,312,312
  306 IF(SUM) 307,310,310
  307 IF(SUM-.0005+DELTA) 312,312,310
  310 ERROR=SGN(J)*DELTA-SUM
      ENORM=PIE*(ERROR/17.0)
      DO 311 I=1,17
  311 WEY(I)=WEY(I)+ENORM*PAT(I,J)
  312 CONTINUE

C     MODIFIED RELAXATION ADAPTION
C     SAME TEST AS ABOVE, BUT THE OUTPUT IS MOVED TOWARD THE
C     ADAPTION LEVEL ADAPT
      IF(SGN(J)) 356,354,354
  354 IF(SUM) 360,360,355
  355 IF(SUM-DELTA) 360,362,362
  356 IF(SUM) 357,360,360
  357 IF(SUM+DELTA) 362,362,360
  360 ERROR=SGN(J)*ADAPT-SUM
      ENORM=PIE*(ERROR/17.0)
      DO 361 I=1,17
  361 WEY(I)=WEY(I)+ENORM*PAT(I,J)
  362 CONTINUE
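
For readers unfamiliar with the arithmetic IF statement, the four procedures can be restated in a modern language. The Python sketch below is an interpretation of the listing above rather than a transcription of the authors' program: the function names are hypothetical, the divisor 17.0 is replaced by the number of weights, and the relaxation and modified relaxation procedures are folded into one function whose `target` argument plays the role of DELTA or ADAPT.

```python
def analog_output(weights, pattern):
    """Weighted sum of the pattern elements (SUM in the Fortran listing)."""
    return sum(w * p for w, p in zip(weights, pattern))


def needs_adaption(total, desired, dead_zone, tolerance=0.0):
    """True when the output has the wrong sign or lies inside the dead zone."""
    if desired > 0:
        return total + tolerance < dead_zone
    return total - tolerance > -dead_zone


def mse_step(weights, pattern, desired, level, bound, rate):
    """Minimum square error adaption: correct whenever |error| exceeds the bound."""
    error = desired * level - analog_output(weights, pattern)
    if abs(error) <= bound:
        return list(weights)
    step = rate * error / len(weights)
    return [w + step * p for w, p in zip(weights, pattern)]


def incremental_step(weights, pattern, desired, dead_zone, beta):
    """Incremental adaption: move every weight by a fixed increment beta."""
    if not needs_adaption(analog_output(weights, pattern), desired, dead_zone):
        return list(weights)
    sign = 1.0 if desired > 0 else -1.0
    return [w + sign * beta * p for w, p in zip(weights, pattern)]


def relaxation_step(weights, pattern, desired, dead_zone, rate,
                    target=None, tolerance=0.0005):
    """Relaxation adaption (target=None) or modified relaxation (target=ADAPT).

    The listing uses the .0005 tolerance only in the relaxation procedure;
    pass tolerance=0.0 to mirror the modified relaxation procedure.
    """
    total = analog_output(weights, pattern)
    if not needs_adaption(total, desired, dead_zone, tolerance):
        return list(weights)
    goal = dead_zone if target is None else target
    step = rate * (desired * goal - total) / len(weights)
    return [w + step * p for w, p in zip(weights, pattern)]


if __name__ == "__main__":
    pattern = [1.0] * 17
    weights = [0.0] * 17
    weights = mse_step(weights, pattern, 1, 30.0, 1.0, 1.0)
    print(analog_output(weights, pattern))
```

With pattern elements of +1 and -1 and a proportional constant of one, a single proportional correction brings the analog output of the imposed pattern to the target level, which is why the procedures differ mainly in when they choose to correct.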
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APPENDIX II 
EXAMPLES OF ADAPTION PROCEDURES 
An example of pattern separation was solved using each of the 
four different adaption procedures to illustrate their characteristics. 
The test patterns were eight patterns of 16 elements each, arranged as
4 x 4 grids of X and o marks. Patterns (1) through (4) were assigned to
the (+1) class and patterns (5) through (8) to the (-1) class.

     [Test pattern grids (1) through (8): not legible in this copy.]


     When a pattern is imposed on an Adaline, corrections are made to the
weights so that its analog output satisfies the training criteria. This,
however, has a tendency to partially destroy some of the effects of
previous training. In an attempt to illustrate this process, the analog
output of the Adaline was examined after each adaption throughout this
test.

     The fixed conditions of this experiment were:

          Weights: Continuously variable with initial values of 0.0
          Pattern sequence: 1, 2, 3, 4, 5, 6, 7, 8
          Proportional constant: 1.00
          Minimum square error adaption level: 30.00
          Minimum square error adaption bound: ±1.00
          Incremental constant: 1.00
          Adaption dead zone: 30.00
          Adaption level: 40.00
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Minimum Square Error Adaption 
Number of iterations required to converge: 64 


Weight elements after convergence. 


32436 
7.864 10.197 5.141 25/73 
@/25 9.090 0228 211.528 
17.821 212.380 16.595 #=15.091 
= 0400 3.237 9/18 4.442 


Analog output of each pattern after convergence. 


Pattern Analog output 
1 29319 
2 29.2385 
3 29 2931 
4 29 6368 
3 =29 4846 
6 =29 529 
7 = 30.063 
8 =30.900 
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Incremental Adaption 
Number of iterations required to converge: 82 
Weight elements after convergence. 


4.900 
10.000 14.000 @5 000 =-52000 


@2.000 10-900 000 -14.000 
=-22.900 -14.000 22.000 #162000 


#2900 =10.900 12.900 6.000 


Analog output of each pattern after convergence.

     Pattern     Analog output
        1            30.000
        2            46.000
        3            30.000
        4            30.000
        5           -42.000
        6           -34.000
        7           -46.000
        8           -30.000
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Relaxation Adaption 
Number of iterations required to converge: 168


Weight elements after convergence. 


3540 
7.881 10.304 252277 250918 
22/30 90141 0285 211.706 
218.106 212.534 =-16.846 215.344 
=e401 32312 =9 e327 4.516 


Analog output of each pattern after convergence.

     Pattern     Analog output
        1            30.000
        2            30.000
        3            30.000
        4            30.000
        5           -30.000
        6           -30.000
        7           -30.000
        8           -30.000

Note: The criterion of the relaxation adaption procedure is that the
magnitude of the analog output for each pattern be equal to or greater
than the adaption dead zone. In this particular example the magnitude
of each pattern analog output after training was exactly equal to the
adaption dead zone. This would not normally be expected.
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Modified Relaxation Adaption 
Number of iterations required to converge: 31
Weight elements after convergence. 


32896 
9.918 12.819 @5.197 26,787 


#12319 9.970 0057 “13-871 
020.536 ~14.23/ @20.485 16.814 


#1542 29.604 -11.569 50/35 


Analog output of each pattern after convergence.

     Pattern     Analog output
        1            34.697
        2            37.435
        3            30.138
        4            30.844
        5           -36.076
        6           -37.091
        7           -40.000
        8           -33.365
APPENDIX III
MINIMUM SQUARE ERROR ADAPTION AS A FUNCTION OF PROPORTIONAL CONSTANT

     An investigation of minimum square error adaption was conducted to
determine the effect of the proportional constant. The number of
iterations required for convergence and the resultant value of the
weight elements were examined. Fixed conditions of the experiment were:

          Adaption level: 30.00
          Bound for minimum square error: ±1.00
          Weights: Continuously variable with initial values of 0.0
          Pattern sequences: A. 1, 2, 3, 4, 5, 6, 7, 8
                             B. lyn Sg 728005004 7 Syne
                             C. Random
Training patterns

     Four sets of training patterns were used, designated X & J, C & T,
X & O and V & T. Each set consisted of eight 4 x 4 patterns of X and o
marks, with patterns (1) through (4) in the (+1) class and patterns (5)
through (8) in the (-1) class.

     [Training pattern grids for the four sets: not legible in this
     copy.]

     NUMBER OF ITERATIONS REQUIRED TO CONVERGE FOR MINIMUM SQUARE
                          ERROR ADAPTION

                               PROPORTIONAL CONSTANT
PATTERN  SEQUENCE   0.25   0.50   0.75   1.00   1.25   1.50   1.75   1.90

X & J       A        434    188    104     64     42     72    162    455
            B        423    182    109     66     49     71    162    434
            C        440    200    136     96     72     99    160    392

C & T       A        143     68     37     47     35     22    132    353
            B        149     68     46     49     41     49    123    343
            C        139     72     56     40     72     80    168    383

X & O       A          ?      ?     43     45     45     69    122    360
            B        300    144     84     34     41     66    145    367
            C        312    152     96     72     72     85    143    366

V & T       A         96     55     35      ?     37     38     92    259
            B         97     51     35     31     45     60    149    397
            C        104     56     40     45     40     72    120    280

* denotes nonconvergence
? denotes an entry that is not legible in this copy
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RESULTANT WEIGHT ELEMENTS AFTER CONVERGENCE FOR X & J PATTERNS 


PROPORTIONAL 
CONSTANT 


025 


1290 


025 


1.00 


PATTERN 
SEQUENCE 
A 30454 
A 30436 
A 3.519 
B 30446 
B 32406 


WEIGHT VALUES AFTER 


72661 
29/37 
217.576 


=0425 


70864 
aeg/25 
017.821 


=2400 


70936 
20899 
18.137 


= 0448 


7622 
20/13 
@176575 


20394 


72670 
= 9663 
o17.932 


29299 


31 


CONVERGENCE 
10.010 2536114 
8.868 0265 

120174 216.368 
8.074 296359 
10.197 25214) 
9.090 0228 

212.380 = 16.595 
28 o23/ 992/18 
106373 50306 
9.161 0202 

212,682 °16.912 
98 6136 29.81} 
9.991 #50137 
8.869 0285 

2126162 216328 
~80059 9.548 
10.244 250334 
92199 0388 

2126542 «16.403 
986121 ~9 846 


@30/58 
211.380 
214.892 


4.389 


a5of13 
211.528 
©15.091 


4.442 


25919 
o11.658 
215.342 


4.694 


250/42 
211.352 
214,895 


4.3380 


= 6698 
112464 
2150285 


4.4/8 

















PROPORTIONAL 
CONSTANT 


1.90 


025 


1.90 


PATTERN 


SEQUENCE 


B 


34569 


32426 


30467 


32657 


WEIGHT VALUES AFTER 


72950 
@ 6920 
o18,.160 


20418 


70672 
20/20 
#170573 


= 389 


72803 
26/4 
217.826 


=e310 


7.762 
oe//5 
@18,095 


=0394 
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CONVERGENCE 
100375 25356 
9.144 e220 
212.729 -16.966 
-8.154 29/82 
10.099 o55116 
8.878 0259 
12.172 216.367 
28 6063 296527 
10.130 230193 
9.104 2290 
2120455 16.525 
8.210 @9./05 
10.126 @5e35/ 
9.026 02/0 
#126358 16.831 
23 6358 29 6690 


26.014 
#110739 
#156378 


4.650 


@5./40 
©11.357 
@14.878 


4.394 


@5 6853 
©112560 
215.247 


42423 


252943 
#112689 
215.239 


4.380 


COMMENTS

1. The proportional constant strongly affects the number of iterations
required to converge. A minimum number of iterations was required when
the proportional constant was approximately equal to one. The adaption
process will not converge if the proportional constant is equal to two.

2. For each pattern pair, the final values of the weight elements were
approximately the same regardless of the proportional constant chosen
or the sequence of the training patterns. Only a representative sampling
of resultant weights is included in the data, but all the results
supported the above conclusion.
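
The role of the proportional constant can be seen in the simplest possible case, a single training pattern with elements of +1 and -1. Each proportional correction then multiplies the remaining error by one minus the proportional constant. The sketch below illustrates only that special case; it is not a reproduction of the multi-pattern experiment above, the error bound is ignored, and the function name is hypothetical.

```python
def mse_error_sequence(rate, level=30.0, steps=4, n_weights=17):
    """Errors seen on successive presentations of one all +1 pattern."""
    pattern = [1.0] * n_weights
    weights = [0.0] * n_weights
    errors = []
    for _ in range(steps):
        total = sum(w * p for w, p in zip(weights, pattern))
        error = level - total
        errors.append(error)
        # Proportional correction shared equally among the weights.
        step = rate * error / n_weights
        weights = [w + step * p for w, p in zip(weights, pattern)]
    return errors


if __name__ == "__main__":
    for rate in (0.5, 1.0, 2.0):
        print(rate, [round(e, 3) for e in mse_error_sequence(rate)])
```

In this special case a constant of one removes the error in a single step, smaller constants approach the target gradually, and a constant of two merely reverses the sign of the error on every step, so the error never diminishes.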
APPENDIX IV
ADALINE RESPONSE TO SIMILAR PATTERNS

     An Adaline was trained, until it had converged to a solution, on the
ten patterns shown below. After convergence, 50 "similar" patterns were
imposed upon the trained Adaline, the "similar" patterns being generated
by randomly changing one pattern element of the training patterns. This
test was conducted using each of the four adaption procedures in turn.
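
The generation of a "similar" pattern can be sketched as follows. The sketch assumes that pattern elements are represented as +1 and -1, so that changing an element means reversing its sign; the function name and the use of a seeded random number generator are illustrative choices, not details taken from the original program.

```python
import random


def similar_pattern(pattern, rng):
    """Return a copy of `pattern` with one randomly chosen element reversed,
    together with the index of the changed element."""
    index = rng.randrange(len(pattern))
    changed = list(pattern)
    changed[index] = -changed[index]
    return changed, index


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the example is repeatable
    original = [1, -1, 1, 1, -1, -1, 1, -1, 1]
    changed, index = similar_pattern(original, rng)
    print("element", index, "changed:", changed)
```

Repeating the call for each training pattern yields as many similar patterns as are wanted.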


Training Patterns

     Ten training patterns of X and o marks were used, patterns (1)
through (5) being in the +1 class and patterns (6) through (10) in the
-1 class.

     [Training pattern grids (1) through (10): not legible in this
     copy.]


The fixed conditions of the experiment were: 


Weights: Continuously variable with initial values of 


Pattern sequence: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Proportional constant: 1.00
Minimum square error bound: ±1.00
Adaption dead zone: 30.00
Incremental constant: 1.00
Adaption level: 40.00
     Adaption Procedure        Number of Iterations

     Minimum Square Error             103
     Incremental                       29
     Relaxation                        99
     Modified Relaxation               19



Range of Analog 


| Range of Analog | 
Outout for Similar 





Output for Train- 


ing Patterns 


+ 


30.128 
296587 


622000 
36.000 


42.274 
30 6000 


52.642 
30.534 
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A tr 


290076 
30.837 


36.000 
52-000 


350249 
46.580 


58.000 
18.000 


43564 
Vell] 


550/08 
170156 








19268 
38 6684 


18 6000 
64.000 


17.099 
49.652 


170367 
3/823 





COMMENTS

     The range of the analog outputs for the similar patterns is
dependent upon the final trained values of the weight elements. In this
example, one pattern element was changed in the generation of each
similar pattern and this had the effect of changing the sign of one
corresponding weight element. Thus the largest change, in this case,
would occur when the pattern bit corresponding to the largest weight
element is changed in sign.

     All of the analog outputs of the similar patterns were of the same
sign as the desired binary output. However, if similar patterns had
been formed by changing more than one pattern element, it is possible
that some of the similar analog outputs would be of a different sign
than the desired pattern binary output.
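
The remark about the largest weight element can be made concrete. If pattern elements are +1 or -1 (an assumption of this sketch), reversing element i changes the analog output by -2 w_i p_i, so the magnitude of the change is twice the magnitude of the corresponding weight. The function names below are hypothetical.

```python
def analog_output(weights, pattern):
    """Weighted sum of the pattern elements."""
    return sum(w * p for w, p in zip(weights, pattern))


def flip_effect(weights, pattern, index):
    """Change in analog output caused by reversing the sign of one element."""
    return -2.0 * weights[index] * pattern[index]


if __name__ == "__main__":
    weights = [0.5, 2.0, -3.0]   # example values only
    pattern = [1, 1, -1]
    before = analog_output(weights, pattern)
    flipped = [1, 1, 1]          # element 2 reversed
    after = analog_output(weights, flipped)
    print(before, after, flip_effect(weights, pattern, 2))
```

A single reversal can therefore change the sign of the output only if twice the magnitude of some weight exceeds the magnitude of the trained analog output.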
APPENDIX V

THE RESPONSE OF AN ADALINE TO TRAINING PATTERNS THAT ARE NOT LINEARLY
SEPARABLE

     Two pattern classes were chosen so that they were not linearly
separable. This was accomplished by assigning the same pattern to both
the (+1) and the (-1) class. There were a total of 100 input patterns,
each of which was composed of nine pattern elements. The training patterns
used in this experiment are tabulated in [ale 

     The experiment consisted of two parts, the first consisting of an
attempt to separate the two pattern classes by imposing all the training
patterns on the Adaline. This was performed for both 500 and 5000
iterations. Second, a random sample of ten patterns was chosen and the
Adaline was trained on these patterns for 200 iterations or until it had
converged to a solution. Then the remaining 90 patterns were imposed upon
the trained Adaline. The fixed conditions for this experiment are the same
as those listed in Appendix IV.

     The results are listed in the following table. For this experiment,
a pattern was defined to be not correctly identified if its analog output
was either of an opposite sign to the desired pattern binary output, or
if the analog output was zero.
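
The counting rule stated above amounts to a one-line test on the product of the analog output and the desired output. A minimal sketch, with a hypothetical function name and example values that are not experimental data:

```python
def not_correctly_identified(analog_outputs, desired_outputs):
    """Count patterns whose analog output is zero or of the wrong sign."""
    return sum(1 for s, d in zip(analog_outputs, desired_outputs) if s * d <= 0)


if __name__ == "__main__":
    outputs = [29.5, -30.2, 0.0, -4.0, 12.0]   # example values only
    desired = [1, -1, 1, 1, -1]
    print(not_correctly_identified(outputs, desired))
```

Note that a pattern which appears in both classes must be counted as not correctly identified in at least one of them, whatever the weights.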
             Number of Patterns Not Correctly Identified

Number of    Number of              Adaption Procedures
Training     Iterations    Min. Sq.     Inc.     Relaxation
Patterns                   Error

  100           500           13          ?          ?
  100          5000           13          ?          ?
   10*          200           11          9          ?
   10*          200           14         10          ?
   10*          200           14          9          ?
   10*          200           22          ?          ?

* Different random samples of ten patterns.
? Entry not legible in this copy.
APPENDIX VI
NUMERIC RECOGNITION TESTS

     A computer simulated Madaline consisting of four Adalines was
employed to attempt the separation of the ten numeric characters into the
ten different classes, zero through nine. Ten hand written digits (one
set) were obtained from each of five persons, and were converted into
patterns in a seven by seven input space. The computer program was
written in Fortran and executed on a CDC 1604 digital computer using the
Minimum Square Error adaption scheme with a proportionality constant of
one and an adaption dead zone of 30. The following output coding scheme
was used in this investigation:

     Digit   Adaline 1   Adaline 2   Adaline 3   Adaline 4
       0         1           1           1           1
       1        -1          -1          -1           1
       2         1          -1           1          -1
       3         1           1           1          -1
       4        -1          -1           1           1
       5         1           1          -1          -1
       6        -1           1           1           1
       7         1          -1          -1          -1
       8         1           1          -1           1
       9        -1           1          -1           1


     A series of tests was run in which one, two or more sets of
patterns were used to train the Madaline, after which the Madaline was
asked to recognize and classify all fifty input patterns. The number
of training iterations performed prior to the classification check, and
the number of digits recognized out of the total of fifty imposed on
the system are included in the data tables below.

No. of patterns   No. of training     No. of patterns   No. of patterns
trained on        iterations          checked           classified

1 set of 10       4 iterations             50                36
                  per pattern
2 sets of 10      4 iterations             50                36
                  per pattern
3 sets of 10      4 iterations             50                 ?
                  per pattern
4 sets of 10      4 iterations             50                45
                  per pattern
5 sets of 10      4 iterations             50                50
                  per pattern

No. of patterns   No. of training     No. of patterns   No. of patterns
trained on        iterations          checked           classified

1 set of 10       200 total                 ?                 ?
                  iterations
2 sets of 10      200 total                 ?                 ?
                  iterations
3 sets of 10      200 total                 ?                 ?
                  iterations
4 sets of 10      200 total                50                50
                  iterations
5 sets of 10      200 total                 ?                 ?
                  iterations

? Entry not legible in this copy.



     The results of this test show that the system must be trained on
all patterns to insure that it will be capable of classifying all the
patterns. However, training on just one set gave the system the
capability of recognizing many of the other patterns submitted to it for
classification. Of interest is the suggestion that repeated iterations
do not necessarily improve the ability of the system to classify the
imposed inputs.
APPENDIX VII
FORTRAN SYMBOL CLASSIFICATION

     The sole purpose of this test was to determine whether or not 45
hand printed Fortran symbols could be separated by a computer simulated
Madaline consisting of the minimum number of Adalines that could
theoretically accomplish this task. Six Adalines, each with a seven by
seven input space, were utilized to accomplish the desired separation
after the hand written characters had been converted into input patterns.
The computer program was written in Fortran and executed on a CDC 1604
digital computer using the Minimum Square Error adaption technique. A
proportionality constant of one and an adaption dead zone of 30 were
used for this test.

     After extensive manipulation of the Madaline output coding schemes,
the 45 characters were properly classified. Nine thousand training
iterations were executed prior to the classification of the 45 input
patterns. The 45 Fortran symbols and the output coding scheme which
accomplished the desired classification are as follows:


Pattern Adaline Adaline Adaline Adaline Adaline Adaline 


One Two Three Four Five Six | 
0 1 1 1 1 ol] 1 
1 l 1 1 1 ol 1 
2 ol 1 ol J ol 1 
3 ol 1 o] of 1 it 
4 1 1 1 1 l el 
3 1 1 1 ol ol o} 
6 a] 1 ol 1 1 l 
7 1 1 1 ol] ol 1 


8 l l o] l 1 ol] 
9 1 1 ol ow] } 1 
A ol , l i ol l J 
B 1 o} 1 1 1 i 
C ol] if 1 ol ol 1 
D 1 ol] 1 1 o] 1 
E 1 o] 1 1 1 =} 
F 1 o} 1 ol 1 o} 
G ol] 1 ol l o] ol 
H 1 el 1 ol ol ol] 
I ol 1 l 1 1 1 
J 1 l ol] o} 1 1 
K l ol] 1 1 ol ol 
L 1 ol 1 a} ol i 
M 1 1 l 1 ol ol 
N 1 =] ol] 1 ol i 
0 l 1 1 l 1 1 
P 1 2] l ol} 1 pt 
Q o] 1 =] 1 1 ol} 
R 1 it 1 o] 1 ol 
S ol] 1 1 1 1 o | 
if 1 1 1 ol] 1 1 
U 1 =] ol] 1 1 ol} 
V ol | ol o] ol] ol 
W ol] 1 l ol 1 =} 
X o] 1 l ol o] ol 
Y o} 1 1 i ol} =] 
Z ol} o} l of ol ol] 
+ l 1 ol] l 1 1 
= ol] ol] 1 o} i 1 
/ ol ol 1 L ol a} 
( of} =} 1 ol 1 2} 
) ol] a] 1 1 ol] i 
9 ol ol ol] =] it ol 
° ol] =] ol] ol ol 1 
= ol ol ol I o] i 
ve | | ol i 1 i 


     The final output coding scheme was obtained mainly by trial and
error methods. However, an effort was made to use certain characteristics
of the individual input patterns to facilitate the desired classification.
Specifically, at least one Adaline was trained to produce the same output
for each of the following sets of input patterns:

a) Patterns with a long vertical line in the left
   hand column. Examples: E, L, P
b) Patterns with other long vertical lines. Examples 4,I1,T 
c) Patterns with a long horizontal line. Examples: E, H, I
d) Patterns with large circles. Examples: 0, Q
e) Patterns with small circles. Examples: 8, 9, P
f) Patterns with small horizontal lines Examples A yt <2 
g) Patterns with small vertical lines. Examples: 5, +
h) Patterns with left to right slant lines. Examples: N, V, X
i) Patterns with right to left slant lines. Examples: K, Z, /
j) Patterns which were smaller than the others. Examples: the comma
   and the period
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